Below is an internally generated list of new or notable data ecosystem companies and open source projects. Where available, each entry is contextualized with open source engagement data, along with general traffic and search interest data.

Note: This tracker is a work in progress and will be updated and expanded regularly.

Data Ecosystem

> Workflow Tools

> Metadata Management

> Query Engines

> Data Quality

> Data Transform Tools

> ML Ops

> ML Monitoring

> Service Catalogs

All Companies

Flyte


Homepage: https://flyte.org/
Category: Workflow
Github: https://github.com/lyft/flyte
Status: In-house Open Source Project (Lyft)

Marquez


Homepage: https://marquezproject.github.io/marquez/
Category: Metadata Management
Github: https://github.com/MarquezProject
Status: Open Source Project

Superwise


Homepage: https://www.superwise.ai/
Category: ML Monitoring
Github:
Status: Standalone Company

Prefect


Homepage: https://www.prefect.io/
Category: Workflow
Github: https://github.com/PrefectHQ
Status: Standalone Company

Ray


Homepage: https://ray.io/
Category: Query Engine
Github: https://github.com/ray-project
Status: Open Source Project

dbt


Homepage: https://www.getdbt.com/
Category: Data Transform
Github:
Status: Standalone Company

Mona Labs


Homepage: https://www.monalabs.io/
Category: ML Monitoring
Github:
Status: Standalone Company

Dataform


Homepage: https://dataform.co/
Category: Data Transform
Github: https://github.com/dataform-co
Status: Standalone Company

Seldon


Homepage: https://www.seldon.io/
Category: ML Ops
Github: https://github.com/SeldonIO
Status: Standalone Company

Arthur AI


Homepage: https://www.arthur.ai/
Category: ML Monitoring
Github:
Status: Standalone Company

Databand


Homepage: https://databand.ai/
Category: Data Quality
Github: https://github.com/databand-ai/
Status: Standalone Company

Soda


Homepage: https://www.soda.io/
Category: Data Quality
Github: https://github.com/sodadata
Status: Standalone Company

Proximo


Homepage: https://www.proximo.com/
Category: Data Quality
Github:
Status: Standalone Company

Fiddler AI


Homepage: https://www.fiddler.ai/
Category: ML Monitoring
Github: https://github.com/fiddler-labs
Status: Standalone Company

BentoML


Homepage: https://www.bentoml.ai/
Category: ML Ops
Github: https://github.com/bentoml
Status: Standalone Company

MLflow


Homepage: https://mlflow.org/
Category: ML Ops
Github: https://github.com/mlflow
Status: Open Source Project

OpsLevel


Homepage: https://www.opslevel.com/
Category: Service Catalogs
Github:
Status: Standalone Company

Monte Carlo


Homepage: https://www.montecarlodata.com/
Category: Data Quality
Github:
Status: Standalone Company

Grid.ai


Homepage: https://www.grid.ai/
Category:
Github: https://github.com/PyTorchLightning
Status: Standalone Company

Dagster


Homepage: https://dagster.io/
Category: Workflow
Github: https://github.com/dagster-io
Status: Open Source Project

Pachyderm


Homepage: https://www.pachyderm.com/
Category: ML Ops
Github: https://github.com/pachyderm
Status: Standalone Company

Toro


Homepage: https://torodata.io/
Category: Data Quality
Github:
Status: Standalone Company

Presto


Homepage: https://prestodb.io/
Category: Query Engine
Github: https://github.com/prestodb
Status: Open Source Project

Backstage


Homepage: https://backstage.io/
Category: Service Catalogs
Github: https://github.com/backstage/backstage
Status: In-house Open Source Project (Spotify)

Spinnaker


Homepage: https://spinnaker.io/
Category: Service Catalogs
Github: https://github.com/spinnaker
Status: Open Source Project

Determined.ai


Homepage: https://determined.ai/
Category: ML Ops
Github: https://github.com/determined-ai
Status: Standalone Company

Snorkel


Homepage: https://www.snorkel.org/
Category: ML Ops
Github: https://github.com/snorkel-team
Status: Open Source Project

Datafold


Homepage: https://www.datafold.com/
Category: Data Quality
Github:
Status: Standalone Company

Amundsen


Homepage: https://www.amundsen.io/
Category: Metadata Management
Github: https://github.com/amundsen-io
Status: In-house Open Source Project (Lyft)

Deepchecks


Homepage: https://www.deepchecks.com/
Category: ML Monitoring
Github:
Status: Standalone Company

DataHub


Homepage: https://engineering.linkedin.com/blog/2019/data-hub
Category: Metadata Management
Github: https://github.com/linkedin/datahub
Status: In-house Open Source Project (LinkedIn)

MonitorML


Homepage: https://monitorml.com/
Category: ML Monitoring
Github:
Status: Standalone Company

Clutch


Homepage: https://clutch.sh/
Category: Service Catalogs
Github: https://github.com/lyft/clutch
Status: In-house Open Source Project (Lyft)

Metaflow


Homepage: https://metaflow.org/
Category: Workflow
Github: https://github.com/Netflix/metaflow
Status: In-house Open Source Project (Netflix)

Starburst


Homepage: https://www.starburstdata.com/
Category: Query Engine
Github: https://github.com/starburstdata
Status: Standalone Company

Airflow


Homepage: https://airflow.apache.org/
Category: Workflow
Github: https://github.com/apache/airflow
Status: Open Source Project

Druid


Homepage: https://druid.apache.org/
Category: Query Engine
Github: https://github.com/apache/druid
Status: Open Source Project

Pinot


Homepage: https://pinot.apache.org/
Category: Query Engine
Github: https://github.com/apache/incubator-pinot
Status: Open Source Project

Spark


Homepage: https://spark.apache.org/
Category: Query Engine
Github: https://github.com/apache/spark
Status: Open Source Project

Databook


Homepage: https://eng.uber.com/databook/
Category: Metadata Management
Github:
Status: In-house Open Source Project (Uber)